The Similarity Index

Author

  • M. Scott Roth
Abstract

Is it possible to calculate a hash value for a document that captures its salient characteristics, such that a repository can be queried for like values and retrieve all “similar” documents? If so, similar documents could be easily identified by a simple SQL query without the need for a full text search engine. Such a value would allow systems to quickly identify duplicate or similar content before it is checked into a repository, introduced to an index, or returned in a query result. Additionally, this value could assist with identifying other content a user might be interested in, though they did not explicitly query for it. This paper endeavors to answer this question by exploring the corpus of existing research in this and related areas, and reporting the results of experimentation. This investigation was conducted with the intent of implementing such a solution in a Documentum environment.

Similar resources

New distance and similarity measures for hesitant fuzzy soft sets

The hesitant fuzzy soft set (HFSS), a combination of hesitant fuzzy sets and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered important tools in different areas such as ...

A Novel Image Structural Similarity Index Considering Image Content Detectability Using Maximally Stable Extremal Region Descriptor

Image content detectability and image structure preservation are closely related concepts with an undeniable role in image quality assessment. However, most image quality studies have concentrated on evaluating image structure, and few have focused on image content detectability. Examination of image structure was first introduced and assessed in the Structural SIMilarity (SSIM) measu...

Determining specific species and the species contribution in the similarity between soil seed bank and standing vegetation (Case study: Lazour rangeland, Firouzkooh)

Determining the potential of soil seed bank and its specific species is important for conservation goals and vegetation restoration of rangelands. In this study, the characteristics of soil seed bank and standing vegetation in Lazour mountain rangeland were investigated in order to estimate the rehabilitation ability of the study area in case of possible disturbances. In order to determine the ...

Comparison of meteorological drought indices in Yazd province

In this research, five indices, the Percent of Normal Precipitation Index (PNPI), Deciles of Precipitation (DPI), Rainfall Anomaly Index (RAI), Bhalme & Mooley Drought Index (BMDI), and Standardized Precipitation Index (SPI), were used to investigate drought at the Yazd synoptic station and 31 non-synoptic stations across the province. For this purpose, the present statistical errors were reconstructed vi...

Ranking and comparing the counties of Lorestan province in the health and health services sector using the TOPSIS method

Background: In spite of the great importance of health and health services, the imbalance in distribution of such services has always been one of the main problems of planners. This research was carried out with the aim of ranking and comparing health and health services in cities in Lorestan province. Materials and Methods: Data was collected from books and documents, and from experts in the ...

Fingerprinting and genetic diversity evaluation of rice cultivars using Inter Simple Sequence Repeat marker

Rice, as one of the most important agricultural crops, has putative potential for ensuring food security and addressing poverty in the world. In the present study, to provide basic information for improving rice through breeding programs, the Inter Simple Sequence Repeat (ISSR) marker was used for DNA fingerprinting and for finding genetic relationships among 32 different cultivars. In this study...

Publication date: 2011